Smart Paper Notes

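The event-driven nature of spiking neural networks makes them biologically plausible and more energy-efficient than artificial neural networks. In this work, we demonstrate motion detection of objects in a two-dimensional visual field. The network architecture presented here is biologically plausible and uses CMOS analog leaky integrate-and-fire neurons and ultra-low-power multi-layer RRAM synapses. Detailed transistor-level SPICE simulations show that the proposed structure can accurately and reliably detect complex motions of objects in a two-dimensional visual field.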

translated by Google Translate
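
The abstract builds its network from leaky integrate-and-fire (LIF) neurons implemented as analog CMOS circuits. As a rough software analogue only, the sketch below integrates the textbook LIF equation tau * dV/dt = -(V - V_rest) + R * I with forward Euler steps, emitting a spike and resetting V whenever it crosses a threshold. It does not model the authors' circuit or the RRAM synapses, and every parameter value (dt, tau, r, v_thresh, v_reset) is an illustrative assumption.

```python
def simulate_lif(input_current, dt=1e-3, tau=20e-3, r=1.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Forward-Euler simulation of a single leaky integrate-and-fire neuron.

    input_current: sequence of input values, one per time step (assumed units).
    Returns (spike_step_indices, membrane_trace).
    """
    v = v_rest
    spikes = []
    trace = []
    for step, i_in in enumerate(input_current):
        # Leak toward v_rest while integrating the input: tau * dV/dt = -(V - V_rest) + R * I
        v += (dt / tau) * (-(v - v_rest) + r * i_in)
        if v >= v_thresh:
            # Threshold crossing: record a spike and reset the membrane potential
            spikes.append(step)
            v = v_reset
        trace.append(v)
    return spikes, trace
```

With a constant input of 2.0 the membrane charges toward R * I = 2.0, crosses the threshold of 1.0 after 14 steps and then fires periodically; with a constant input of 0.5 it settles at 0.5 and never fires. The leak is what makes the neuron respond to input timing and intensity rather than simply summing everything it has ever received.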

JEMMA: An Extensible Java Dataset for ML4Code Applications

Anjan Karmakar, Miltiadis Allamanis, Romain Robbes

Category: Machine Learning

2022-12-18

Machine Learning for Source Code (ML4Code) is an active research field in which extensive experimentation is needed to discover how to best use source code's richly structured information. With this in mind, we introduce JEMMA, an Extensible Java Dataset for ML4Code Applications, which is a large-scale, diverse, and high-quality dataset targeted at ML4Code. Our goal with JEMMA is to lower the barrier to entry in ML4Code by providing the building blocks to experiment with source code models and tasks. JEMMA comes with a considerable amount of pre-processed information such as metadata, representations (e.g., code tokens, ASTs, graphs), and several properties (e.g., metrics, static analysis results) for 50,000 Java projects from the 50KC dataset, with over 1.2 million classes and over 8 million methods. JEMMA is also extensible allowing users to add new properties and representations to the dataset, and evaluate tasks on them. Thus, JEMMA becomes a workbench that researchers can use to experiment with novel representations and tasks operating on source code. To demonstrate the utility of the dataset, we also report results from two empirical studies on our data, ultimately showing that significant work lies ahead in the design of context-aware source code models that can reason over a broader network of source code entities in a software project, the very task that JEMMA is designed to help with.
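
Code tokens are one of the pre-processed representations JEMMA ships. The tokenizer JEMMA actually uses is not described in this abstract, so the following is only a hypothetical, simplified regex lexer for Java (the name tokenize_java is illustrative) that shows what a token-sequence representation of a method looks like. It ignores char literals and text blocks and does not protect comment markers inside strings; a real pipeline would use a proper Java parser.

```python
import re

# Simplified, hypothetical Java lexer: string literals, identifiers/keywords,
# numbers, common multi-character operators, then single-character symbols.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'
    r'|[A-Za-z_$][A-Za-z0-9_$]*'
    r'|\d+(?:\.\d+)?'
    r'|==|!=|<=|>=|&&|\|\||\+\+|--'
    r'|[{}()\[\];,.=<>+\-*/%!&|?:]'
)
# Line and block comments (does not account for "//" appearing inside strings).
COMMENT_RE = re.compile(r'//[^\n]*|/\*.*?\*/', re.S)


def tokenize_java(source: str) -> list[str]:
    """Strip comments, then return the token sequence; whitespace is skipped."""
    return TOKEN_RE.findall(COMMENT_RE.sub(' ', source))


print(tokenize_java('int add(int a, int b) { return a + b; } // sum'))
```

A token list like this is the flattest of the representations the abstract mentions; ASTs and graphs add the syntactic and semantic structure that token sequences discard.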

The Codex model has demonstrated extraordinary competence in synthesizing code from natural language problem descriptions. However, in order to reveal unknown failure modes and hidden biases, such large-scale models must be systematically subjected to multiple and diverse evaluation studies. In this work, we evaluate the code synthesis capabilities of the Codex model based on a set of 115 Python problem statements from a popular competitive programming portal: HackerRank. Our evaluation shows that Codex is indeed proficient in Python, solving 96% of the problems in a zero-shot setting, and 100% of the problems in a few-shot setting. However, Codex exhibits clear signs of generating memorized code based on our evaluation. This is alarming, especially since the adoption and use of such models could directly impact how code is written and produced in the foreseeable future. With this in mind, we further discuss and highlight some of the prominent risks associated with large-scale models of source code. Finally, we propose a framework for code-synthesis evaluation using variations of problem statements based on mutations.
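
The abstract proposes evaluating code synthesis on mutated variants of problem statements but does not spell out the mutations here. A minimal sketch of the idea, assuming one hypothetical mutation operator (shifting integer literals) and a simple memorization signal (the drop in solve rate from original to mutated statements); mutate_constants and memorization_gap are illustrative names, not the authors' framework.

```python
import random
import re


def mutate_constants(statement: str, seed: int = 0) -> str:
    """Hypothetical mutation operator: shift every integer literal in a problem
    statement by a random offset in 1..9, so the expected answers change while
    the wording stays the same."""
    rng = random.Random(seed)
    return re.sub(r"\d+", lambda m: str(int(m.group()) + rng.randint(1, 9)), statement)


def memorization_gap(passed_original: list[bool], passed_mutated: list[bool]) -> float:
    """Drop in solve rate from the original statements to their mutated variants."""
    original_rate = sum(passed_original) / len(passed_original)
    mutated_rate = sum(passed_mutated) / len(passed_mutated)
    return original_rate - mutated_rate


original = "Given N = 5 integers, print the sum of the largest 3."
print(mutate_constants(original, seed=42))
```

A model that solves an original problem but fails its mutated twin is plausibly reproducing a memorized solution rather than reading the prompt; averaging that gap over many problems gives one rough memorization score.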

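Humans show a high level of abstraction capability in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them through an interpretable protocol. To equip machines with such capabilities, we propose the Primitive-based Sketch Abstraction task, whose goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget. To solve this task, our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimal when the original sketch is precisely reconstructed with the predicted primitives. Empirically, our PMN abstraction achieves the highest performance on sketch recognition and sketch-based image retrieval while also being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at https://github.com/explainableml/sketch-primitives.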
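
To make the stroke-to-primitive matching concrete, here is a minimal NumPy sketch, not the authors' code (their implementation is in the repository above). It assumes the affine transforms are already given (in PMN they are predicted by a network), computes a brute-force distance transform of the stroke, and scores each transformed primitive with a symmetric, chamfer-style loss built on it; the exact loss in the paper may differ. All function names are hypothetical.

```python
import numpy as np


def nearest_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """For each point in a (N, 2), the Euclidean distance to its nearest point in b (M, 2)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def distance_transform(mask: np.ndarray) -> np.ndarray:
    """Brute-force distance transform: distance from every pixel to the nearest stroke pixel."""
    grid = np.indices(mask.shape).reshape(2, -1).T.astype(float)  # all pixels as (row, col)
    on = np.argwhere(mask).astype(float)                          # stroke pixels as (row, col)
    return nearest_dist(grid, on).reshape(mask.shape)


def dt_loss(points: np.ndarray, stroke_mask: np.ndarray, dt_map: np.ndarray) -> float:
    """Stroke DT sampled at the primitive points (primitive -> stroke), plus the
    distance from each stroke pixel to the nearest primitive point (stroke -> primitive).
    Both terms vanish when the transformed primitive exactly covers the stroke."""
    h, w = dt_map.shape
    idx = np.clip(np.rint(points).astype(int), 0, [h - 1, w - 1])
    forward = dt_map[idx[:, 0], idx[:, 1]].mean()
    backward = nearest_dist(np.argwhere(stroke_mask).astype(float), points).mean()
    return float(forward + backward)


def match_stroke(stroke_mask, primitives, affines):
    """Pick the primitive whose affinely transformed points best reconstruct the stroke.
    primitives[k] is a (P, 2) array of (row, col) points; affines[k] is (A, t)."""
    dt_map = distance_transform(stroke_mask)
    losses = [dt_loss(pts @ A.T + t, stroke_mask, dt_map)
              for pts, (A, t) in zip(primitives, affines)]
    return int(np.argmin(losses)), losses
```

For a horizontal stroke, a unit line primitive scaled and translated onto it scores (near) zero, while a circle primitive placed over the same region scores higher, so match_stroke selects the line. In a trainable model the rounding step would be replaced by differentiable interpolation so the loss can drive the predicted transforms.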
